Estudio de un Conjunto de Datos que Representa Eventos Infrecuentes y la Viabilidad de la Distribución Poisson Para su Modelado

El presente trabajo se propone estudiar un conjunto de datos con los incidentes atendidos por el departamento de bomberos de una gran urbe contemporánea para ver hasta qué punto estos se adecuan a la idea inicial que podríamos de tener. A saber, que en su mayoría son eventos excepcionales y por tanto es tentador pensar que son bien modelados por la distribución del títutlo.

Siguiente el consejo de la docente, añadimos también el estudio a través de la distribución binomial negativa, que en su concepción heurística también se plantea como ideal para modelar esta clase de sucesos.

Preparamos los Datos para su Estudio

Cargamos algunas librerías útiles:

library(MASS, pos = "package:base") # WARNING: Import before tidyverse to keep dplyr's `select`
library(tidyverse)
library(lubridate, pos = "package:base")
library(glue)
library(here)
library(knitr)

Leemos el dataset, bajado de acá:

read_dataset <- function(fn) {
  read_csv(fn) %>%
  rename(
    incident_type = CFD_INCIDENT_TYPE_GROUP,
    incident_time = CREATE_TIME_INCIDENT) %>%
    # Reemplazamos NA con "NULL" para poder seleccionar los incident_type con
    # valor faltante más tarde:
    mutate(incident_type = map_chr(incident_type,
                                   ~ifelse(is.na(.x), "NULL", .x)))  
}

count_incidents_per_type <- function(df) {
  df %>% count(incident_type) %>% arrange(-n)
}

fn <- here("data/Cincinnati_Fire_Incidents__CAD___including_EMS__ALS_BLS_.csv")
full_incidents <- read_dataset(fn)
## Parsed with column specification:
## cols(
##   ADDRESS_X = col_character(),
##   LATITUDE_X = col_double(),
##   LONGITUDE_X = col_double(),
##   AGENCY = col_character(),
##   CREATE_TIME_INCIDENT = col_character(),
##   DISPOSITION_TEXT = col_character(),
##   EVENT_NUMBER = col_character(),
##   INCIDENT_TYPE_ID = col_character(),
##   INCIDENT_TYPE_DESC = col_character(),
##   NEIGHBORHOOD = col_character(),
##   ARRIVAL_TIME_PRIMARY_UNIT = col_character(),
##   BEAT = col_character(),
##   CLOSED_TIME_INCIDENT = col_character(),
##   DISPATCH_TIME_PRIMARY_UNIT = col_character(),
##   CFD_INCIDENT_TYPE = col_character(),
##   CFD_INCIDENT_TYPE_GROUP = col_character(),
##   COMMUNITY_COUNCIL_NEIGHBORHOOD = col_character()
## )
n_per_incident_type <- count_incidents_per_type(full_incidents)

# Uncomment to print the list of incident types and their frequency:
# kable(n_per_incident_type) 

Lo que nos interesa es separar las llamadas según su tipo y ver la frecuencia de cada tipo, para eso por lo pronto las contamos, considerando que el período de los registros abarca unos tres años.

Tabla con los Primeros 30 Tipos de Incidente y su Frecuencia

incident_type n
SICK PERSON 30798
BREATHING PROBLEMS 24009
FALLS 22481
PERSON DOWN 19529
AUTOMATIC FIRE ALARM 19507
MEDICAL EMERGENCY 14627
CHEST PAIN / CHEST DISCOMFORT (NON-TRAUMATIC) 14327
INFORMATION TELETYPE-NO DISPATCH 12682
UNCONSCIOUS / FAINTING (NEAR) 10468
CONVULSIONS / SEIZURES 9019
UNKNOWN PROBLEM (PERSON DOWN) 8358
ABDOMINAL PAIN / PROBLEMS 7462
ACCIDENT WITH INJURY - FIRE ONLY 7324
HEROIN OVERDOSE 7290
HEMORRHAGE / LACERATIONS 7150
ASSAULT WITH INJURY 6224
DIABETIC PROBLEMS 5205
ACCIDENT WITH INJURY 4329
STRUCTURE FIRE 4317
NULL 4054
OUTDOOR FIRE 3792
DETAIL/SPECIAL ASSIGNMENT 3528
STROKE (CVA) / TRANSIENT ISCHEMIC ATTACK (TIA) 3313
PREGNANCY / CHILDBIRTH / MISCARRIAGE 3200
TRAUMATIC INJURIES (SPECIFIC) 3095
HEART PROBLEM / A.I.C.D 2709
OVERDOSE / POSIONING (INGESTION) 2620
SUICIDE ATTEMPT 2611
FUMES 2526
SALVAGE 2514

Preparamos una Función que Analiza la Muestra para el Tipo de Evento Seleccionado Según Ambas Distribuciones

keep_incidents_of_type <- function(chosen_type) {
  full_incidents %>%
    select(incident_type, incident_time) %>%
    filter(incident_type == chosen_type) %>%
    select(incident_time) %>%
    mutate(incident_time = as.POSIXct(incident_time,
                                      format = "%m/%d/%Y %I:%M:%S %p",
                                      tz = "UTC")) %>%
    mutate(incident_day = floor_date(incident_time, "day")) -> individual_incidents_table
  return(
         list(table=individual_incidents_table, chosen_type=chosen_type) 
         )
}

# Esta función toma la tabla donde hay una entrada y fecha (redondeada) por evento y agrupa los eventos que ocurrieron en el mismo día. Además, añade los días de frecuencia 0 hallados a través de una simple diferencia entre las fechas extremas y la cantidad de días en la lista ingresada. 

get_freq_by_date_table  <- function(kept_incidents) {
  daily_incidents_count <-
    kept_incidents$table %>%
    count(incident_day) %>%
    select(n)

  # Agregamos los días sin llamadas (días que no aparecen en el dataset)
  # como filas de ceros:
  max_day <- max(kept_incidents$table$incident_day)
  min_day <- min(kept_incidents$table$incident_day)
  total_days <- max(as.integer(max_day - min_day), nrow(daily_incidents_count))
  missing_days <- total_days - nrow(daily_incidents_count)
  
  daily_incidents_count <-
    rbind(tibble(n = rep(0, missing_days)), daily_incidents_count) %>%
    mutate(n = as.integer(n))

  # Frecuencia de frecuencias: contamos cuántos días tienen 0 llamadas,
  # cuántos días tienen 1 llamada, ...
  daily_incidents_freq <-
    daily_incidents_count %>%
      count(n) %>%
      rename(count = nn) %>%
      mutate(
        density = count / nrow(daily_incidents_count),
        density_type = "empirical") %>%
      select(n, density_type, density)

  # Safety check:
  assertthat::assert_that((near(sum(daily_incidents_freq$density), 1)))
  return( list(
               freq=daily_incidents_freq,
               count=daily_incidents_count
               )
  )
}

build_chosen_incident_dataframe  <- function(chosen_type) {
  frame_by_date  <- keep_incidents_of_type(chosen_type)
  frame_by_freq <- get_freq_by_date_table(frame_by_date)

  # Asumiendo distribución Poisson, estimamos el parámetro como la media muestral:

  frame_by_freq["lambda"]  <- mean(frame_by_freq$count$n)


  return( frame_by_freq )
}

compare_poisson_and_bn_for_incident_type <- function(chosen_incident_type, graphic = T) {
  chosen_incident_dataframe  <- build_chosen_incident_dataframe(chosen_incident_type)
  daily_incidents_count  <- chosen_incident_dataframe$count
  daily_incidents_freq  <- chosen_incident_dataframe$freq

  

  lambda_estimate <- chosen_incident_dataframe$lambda

  # Ajustamos el número de incidentes diario con la distribución Binomial Negativa
  # para comparar con la Poisson:
  fit <- fitdistr(daily_incidents_count$n, densfun = "negative binomial")
  bn_size_estimate <- fit$estimate[["size"]]
  # Obs: El parámetro "mu" de la parametrización alternativa de la distribución
  # BN se estima igual que el lambda de Poisson, con la media muestral.
  bn_mu_estimate <- lambda_estimate

  max_n <- max(daily_incidents_count$n)
  min_n <- min(daily_incidents_count$n)

  # Juntamos las densidades esperadas por Poisson y BN para cada n observado
  # para luego comparar con la densidad empírica:
  compute_density <- function(n, density_type, density_function) {
    if (density_type == "poisson") {
      density_function(n, lambda = lambda_estimate)
    } else if (density_type == "negative_binomial") {
      density_function(n, size = bn_size_estimate, mu = bn_mu_estimate)
    } else {
      stop(glue("I don't know how to compute density for type: {density_type}"))
    }
  }
  
  probabilities_per_n <-
    cross_df(list(
    n = seq(min_n, max_n),
    density_type = c("poisson", "negative_binomial"))) %>%
    mutate(
      density_function = map(density_type,
                             ~ifelse(.x == "poisson", dpois, dnbinom)),
      density = pmap_dbl(list(n, density_type, density_function),
                         compute_density)) %>%
    select(n, density_type, density)
  
  probabilities_per_n <- rbind(probabilities_per_n, daily_incidents_freq)
  
  # Sumamos las curvas de densidad Poisson y BN al gráfico de la empírica:
  poisson_color <- "ForestGreen"
  bn_color <- "IndianRed"
  empirical_color <- "#444444"
 
  if (graphic == T ) {
    fig <-
      probabilities_per_n %>%
      ggplot() +
      aes(x = n, y = density, color = density_type) +
      geom_point(alpha = .7) +
      geom_line() +
      scale_color_manual(values = c(empirical_color, bn_color, poisson_color)) +
      labs(
        title = glue("Número de '{chosen_incident_type}' por día"),
        subtitle = "Ajuste de la distribución empírica con Poisson y Binomial Negativa",
        y = "Densidad",
        x = glue("Número de llamadas"))
    return (list(
      probabilities_per_n = probabilities_per_n,
      fig = fig,
      lambda_estimate = lambda_estimate
    ))
  } 

  return (list(
    probabilities_per_n = probabilities_per_n,
    lambda_estimate = lambda_estimate
  ))
  
  
}

Corremos la comparación para todos los tipos de incidentes del dataset:

### Para generar los gráficos, cambiar la variable fija "generate_graphics"

distribuciones_de_incidentes_largas <- list()

for (incident_type in n_per_incident_type$incident_type) {
  generate_graphics <- T
  print(glue("Working with: {incident_type}"))
  comparison <- compare_poisson_and_bn_for_incident_type(incident_type, generate_graphics)
  
  # Sanitize the incident type for the plot filename:
  name <- str_to_lower(incident_type)

  name <- str_replace_all(name, "/", "-")
  
  distribuciones_de_incidentes_largas[[name]] <- list(nombre = name, tabla = comparison$probabilities_per_n, lambda = comparison$lambda_estimate)
                                           
 
  if (generate_graphics) {
    filename <- here(glue("results/{name}.png"))
    ggsave(filename, comparison$fig, width = 10, height = 6)
    

    print(comparison$fig)
  }

}
## Working with: SICK PERSON
## Working with: BREATHING PROBLEMS

## Working with: FALLS

## Working with: PERSON DOWN

## Working with: AUTOMATIC FIRE ALARM

## Working with: MEDICAL EMERGENCY

## Working with: CHEST PAIN / CHEST DISCOMFORT (NON-TRAUMATIC)

## Working with: INFORMATION TELETYPE-NO DISPATCH

## Working with: UNCONSCIOUS / FAINTING (NEAR)

## Working with: CONVULSIONS / SEIZURES

## Working with: UNKNOWN PROBLEM (PERSON DOWN)

## Working with: ABDOMINAL PAIN / PROBLEMS

## Working with: ACCIDENT WITH INJURY - FIRE ONLY

## Working with: HEROIN OVERDOSE

## Working with: HEMORRHAGE / LACERATIONS

## Working with: ASSAULT WITH INJURY

## Working with: DIABETIC PROBLEMS

## Working with: ACCIDENT WITH INJURY
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: STRUCTURE FIRE

## Working with: NULL
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: OUTDOOR FIRE

## Working with: DETAIL/SPECIAL ASSIGNMENT

## Working with: STROKE (CVA) / TRANSIENT ISCHEMIC ATTACK (TIA)

## Working with: PREGNANCY / CHILDBIRTH / MISCARRIAGE

## Working with: TRAUMATIC INJURIES (SPECIFIC)

## Working with: HEART PROBLEM / A.I.C.D

## Working with: OVERDOSE / POSIONING (INGESTION)

## Working with: SUICIDE ATTEMPT

## Working with: FUMES
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: SALVAGE
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: FIRE ADVISED - NO DISPATCH

## Working with: WIRES DOWN/ARCING
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: BACK PAIN (NON-TRAUMATIC OR NON-RECENT TRAUMA)

## Working with: FIRE DRILL - NO RESPONSE

## Working with: SHOOTING HAS OCCURRED

## Working with: VEHICLE FIRE

## Working with: LOCK IN/LOCK OUT

## Working with: CARDIAC OR RESPIRATORY ARREST / DEATH
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: HEADACHE

## Working with: WALKIN AT FIRE STATIOIN

## Working with: ALLERGIES (REASTIONS) / ENVENOMATIONS (STINGS, BITES)

## Working with: CARBON MONOXIDE ALARM
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: ENTRAPMENT - AUTO ACCIDENT
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: STUCK IN ELEVATOR

## Working with: STILL ALARM

## Working with: POSSIBLE DOA

## Working with: CHEST PAIN/CHEST DISCOMFORT (NON-TRAUMATIC)
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: INVESTIGATION
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: DOMESTIC VIOLENCE WITH INJURY

## Working with: FIRE HYDRANT LEAKING OR STRUCK

## Working with: TRAFFIC / TRANSPORTATION INCIDENTS
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: NOT USED

## Working with: AUTO ACCIDENT - CAR HIT BUILDING

## Working with: CUTTING - FIRE ONLY
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: ROBBERY WITH INJURY

## Working with: CHOKING

## Working with: TASING - POLICE REQUEST

## Working with: CUTTING
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: ASSAULT / SEXUAL ASSAULT / STUN GUN
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: GAS SPILL - MINOR < 25 GALLONS

## Working with: ANIMAL BITES / ATTACKS

## Working with: EYE PROBLEMS / INJURIES
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: FIRE REPORTED OUT

## Working with: SWAT OPERATION
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: BURNS (SCALDS) / EXPLOSION (BLAST)

## Working with: OUTLET SPARKING

## Working with: FIRE EQUIPMENT INVOLVED IN ACCIDENT

## Working with: POLICE OFFICER NEEDS ASSISTANCE

## Working with: CARBON MONOXIDE ILL/WITH SYMPTOMS

## Working with: HEAT / COLD EXPOSURE
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: BOMB REMOVAL

## Working with: RIVER EMERGENCY
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: TRUCK FIRE
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: ENTRAPMENT - MECHANICAL/FIRE ONLY
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: MUTUAL AID REQEST
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: PSYCHIATRIC / ABNORMAL BEHAVIOR / SUICIDE ATTEMPT

## Working with: STAB / GUNSHOT / PENETRATING TRAUMA
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: TEST

## Working with: INACTIVITY ALARM

## Working with: RAPE WITH INJURY
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: INJURY - POLICE REQUEST
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: ELECTROCUTION / LIGHTNING

## Working with: FIREFIGHTER NEEDS ASSISTANCE
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: RAPE

## Working with: CHECMICAL INVESTIGATION

## Working with: STRUCTURAL OR TRENCH COLLAPSE
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: AIR CRAFT IN TROUBLE OR DOWN
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: GAS SPILL - MAJOR > 25 GALLONS

## Working with: BIOLOGICAL/CHEMCIAL THREAT
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: DROWNING - FIRE ONLY

## Working with: DROWNING - LARGE BODY OF WATER
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: INACCESSIBLE INCIDENT / OTHER ENTRAPMENTS (NON-TRAFFIC)

## Working with: DROWNING / NEAR DROWNING / DIVING / SCUBA ACCIDENT
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: CHEMICAL SPILL

## Working with: HIGH RISK SEARCH WARRANT
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: CARBON MONOXIDE /INHALATION / HAZMAT / CBRN

## Working with: TRANSFER CALL

## Working with: WATER RESCUE TRIBUTARY
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: GREATER CINCINNATI AIRPORT CRASH
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: CARDIAC OR RESPIRATORY ARREST/DEATH

## Working with: BOAT FIRE ON RIVER

## Working with: FIRE SERVICE - NO DISPATCH/ NOT USED

## Working with: BIOHAZARD ALARM AT POST OFFICE

## Working with: HEMORRHAGE/LACERATIONS

saveRDS(distribuciones_de_incidentes_largas,"incidentes.rds")

Comparamos el Costo de Ambas Predicciones Respecto a la Distribución Empírica

En realidad para ser totalmente confiable este estudio necesita algún tipo de bootsrap como k-pliegues para evitar un sesgo hacia la confirmación pero consideramos que para los fines de este vistazo no es necesario.

distribuciones_de_incidentes_largas %>% 
  map(~.$tabla) %>%
  map(~spread(.,density_type,density)) %>%
    map(
            ~mutate( . , error_p = (poisson - empirical)^2,
                    error_bn = (negative_binomial - empirical)^2,
                    )
            ) -> distribuciones_de_incidentes
  

obtener_varianzas <- function(tabla) {
  v_poisson <- sum(tabla$error_p)/nrow(tabla) 
  v_binomial_neg <- sum(tabla$error_bn)/nrow(tabla) 
  ganadora  <- ifelse(v_poisson <= v_binomial_neg,"poisson","binomial negativa : (")
  return(list(ganadora=ganadora,varianza_poisson=v_poisson, varianza_binomial_negativa=v_binomial_neg))
}

distribuciones_de_incidentes %>% map(obtener_varianzas)  -> varianza_por_caso 

son_poisson  <- which(map( varianza_por_caso, ~.$ganadora == "poisson" ) == 1 )
son_pascal  <- which(map( varianza_por_caso, ~.$ganadora == "binomial negativa : (" ) == 1 )

Los eventos que tienen un comportamiento mejor descrito por la distribución de Poisson son los siguientes:

x
pregnancy - childbirth - miscarriage 24
back pain (non-traumatic or non-recent trauma) 33
allergies (reastions) - envenomations (stings, bites) 41
choking 56
fire reported out 63
burns (scalds) - explosion (blast) 65
carbon monoxide ill-with symptoms 69
psychiatric - abnormal behavior - suicide attempt 76
electrocution - lightning 82
rape 84
checmical investigation 85
chemical spill 94
carbon monoxide -inhalation - hazmat - cbrn 96
cardiac or respiratory arrest-death 100
boat fire on river 101
fire service - no dispatch- not used 102
biohazard alarm at post office 103
hemorrhage-lacerations 104

Muchos más eventos tuvieron distribución binomial negativa (o fueron mejor explicados por ella)

x
falls 3
chest pain - chest discomfort (non-traumatic) 7
unconscious - fainting (near) 9
unknown problem (person down) 11
hemorrhage - lacerations 15
diabetic problems 17
stroke (cva) - transient ischemic attack (tia) 23
traumatic injuries (specific) 25
heart problem - a.i.c.d 26
overdose - posioning (ingestion) 27
suicide attempt 28
shooting has occurred 35
cardiac or respiratory arrest - death 38
walkin at fire statioin 40
carbon monoxide alarm 42
stuck in elevator 44
possible doa 46
chest pain-chest discomfort (non-traumatic) 47
investigation 48
domestic violence with injury 49
traffic - transportation incidents 51
not used 52
auto accident - car hit building 53
cutting - fire only 54
tasing - police request 57
gas spill - minor < 25 gallons 60
animal bites - attacks 61
eye problems - injuries 62
outlet sparking 66
fire equipment involved in accident 67
police officer needs assistance 68
heat - cold exposure 70
bomb removal 71
river emergency 72
truck fire 73
mutual aid reqest 75
stab - gunshot - penetrating trauma 77
inactivity alarm 79
rape with injury 80
injury - police request 81
firefighter needs assistance 83
structural or trench collapse 86
air craft in trouble or down 87
gas spill - major > 25 gallons 88
biological-chemcial threat 89
drowning - fire only 90
drowning - large body of water 91
inaccessible incident - other entrapments (non-traffic) 92
drowning - near drowning - diving - scuba accident 93
high risk search warrant 95
transfer call 97
water rescue tributary 98

Trabajamos Con el Parámetro Lambda

distribuciones_de_incidentes_largas %>% map_dbl(~.$lambda) -> vector_lambdas 

lambdas <- tibble(nombre = names(vector_lambdas),lambda=unname(vector_lambdas))          

lambdas %>% ggplot() + 
  aes(x=lambda) +
  geom_histogram(binwidth=1) 

lambdas %>% mutate( distribucion = rep("empírica") ) %>% select (lambda,distribucion)  -> lambdas_larga 

min_lambda <- min(lambdas_larga$lambda)
max_lambda <- max(lambdas_larga$lambda)

nro_de_intervalos <- 23


lambdas_larga %>% mutate(Intervalo = cut(lambda,breaks= nro_de_intervalos,include.lowest=F, ordered = T ) ) %>% 
  group_by(Intervalo) %>%
  summarise(Frecuencia_emp = n()/nrow(lambdas_larga) ) -> lambdas_freq

#Necesitamos obtener los valores centrales de cada intervalo. Esto es sorprendentemente trabajoso.


cbind(inferior = as.numeric( sub("\\((.+),.*", "\\1", lambdas_freq$Intervalo) ),
      superior = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", lambdas_freq$Intervalo) )) %>%
as.tibble -> extremos

lambdas_agrupada  <- bind_cols(lambdas_freq,extremos)

lambdas_agrupada %>% mutate(lambda_centro=(superior-inferior)/2+inferior) -> lambdas_agrupada

distribucion_lambda <- fitdistr(lambdas$lambda,"exponential")
distribucion_gamma <- fitdistr(lambdas$lambda,"gamma")

forma_g <- distribucion_gamma$estimate[1]
tasa_g <- distribucion_gamma$estimate[2]

lambdas_agrupada %>% mutate(
                            Frecuencia_estimada_exp = dexp(lambda_centro,rate=distribucion_lambda$estimate),
                            Frecuencia_estimada_gamma = dgamma(x=lambda_centro,shape=forma_g,rate=tasa_g)) -> lambdas_agrupada

lambdas_larga <- gather(lambdas_agrupada,Frecuencia_emp,Frecuencia_estimada_exp,Frecuencia_estimada_gamma,key="Origen",value="Frec")

ggplot(data = lambdas_larga, mapping = aes(x=lambda_centro,y=Frec, color = Origen )) +
  geom_line()